Online Appendix for RELEAF: An Algorithm for Learning and Exploiting Relevance

Authors

  • Cem Tekin
  • Mihaela van der Schaar
Abstract

This online appendix consists of two sections. The first section gives the proof of Theorem 5 in [1]. The second section is an extended version of the numerical results given in Section V of [1].

I. PROOF OF THEOREM 5

A. Preliminaries

Let $A := |\mathcal{A}|$. We first define a sequence of events that will be used in the analysis of the regret of RELEAF. For $p \in \mathcal{P}_{R(a),t}$, let $\pi(a, p) = \mu(a, x^*_{R(a)}(p))$, where $x^*_{R(a)}(p) = \{x^*_i(p_i)\}_{i \in R(a)}$ and $x^*_i(p_i)$ is the type-$i$ context at the geometric center of $p_i$.

Let $\mathcal{W}(R(a))$ be the set of $D_{\text{rel}}$-tuples of types such that $R(a) \subset w$ for every $w \in \mathcal{W}(R(a))$. We have

$$|\mathcal{W}(R(a))| = \binom{D - |R(a)|}{2D_{\text{rel}} - |R(a)|}.$$

For a $D_{\text{rel}}$-tuple of types $w$, let $\mathcal{D}(w, D')$ be the set of $D'$-tuples of types whose elements are from the set $\mathcal{D} - w$. For any $w \in \mathcal{W}(R(a))$ and $j \in \mathcal{D}(w, D_{\text{rel}})$, let

$$\text{INACC}_t(a, w, j) := \left\{ \left| \bar{r}_t(p_{w,t}, p_{j,t}, a) - \pi(a, p_{R(a),t}) \right| > \frac{3}{2} L \sqrt{D_{\text{rel}}} \max_{i \in R(a)} s(p_{R(a),t}) \right\}$$

be the event that the sample mean reward of action $a$ corresponding to the $2D_{\text{rel}}$-tuple of types $(w, j)$ is inaccurate. Let




Publication date: 2014